CUDA Programming Guide: Architectural Paradigms: von Neumann vs. Harvard

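The fundamental design of a computing system is defined by the relationship between its processing unit and its memory. The main distinction is whether instructions and data share a single transfer path or travel over independent channels.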

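General-purpose systems such as x86-64 follow the von Neumann model, which is characterized by a unified memory space. The CPU reaches both code and data over a single bus, and this gives rise to the von Neumann bottleneck: the delay incurred whenever the CPU must switch the bus between fetching an instruction and accessing an operand.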
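The sequential bus usage described above can be sketched with a toy model. This is a simplified illustration, not a description of any real x86-64 bus: it assumes every bus transaction takes exactly one cycle and that each instruction reads one operand. The function name `von_neumann_bus_trace` is hypothetical.

```python
def von_neumann_bus_trace(num_instructions, operands_per_instruction=1):
    """Return the ordered bus transactions on a single shared bus.

    Toy assumption: one transaction per cycle, so the length of the
    returned list equals the number of bus cycles consumed.
    """
    trace = []
    for i in range(num_instructions):
        # The shared bus first carries the instruction fetch...
        trace.append(("fetch", i))
        # ...and only afterwards the operand accesses for that instruction.
        for _ in range(operands_per_instruction):
            trace.append(("data", i))
    return trace


trace = von_neumann_bus_trace(2)
print(trace)       # [('fetch', 0), ('data', 0), ('fetch', 1), ('data', 1)]
print(len(trace))  # 4
```

In this model the bus never carries a fetch and a data access in the same cycle, which is the alternation that the bottleneck refers to.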

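Common in special-purpose processors and in ARMv8-A L1 cache implementations, the Harvard architecture uses physically separate memory storage and signal paths for instructions and data. An opcode and a data operand can therefore be read at the same time, which significantly increases throughput.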
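A rough cycle count makes the throughput difference concrete. The sketch below is an idealized model under stated assumptions: every memory access takes one cycle, there are no caches, stalls, or pipelining effects, and the split design has exactly one instruction path and one data path. The function names are hypothetical.

```python
def cycles_shared_bus(num_instructions, operands_per_instruction=1):
    """Single shared bus: the fetch and each operand access are serialized."""
    return num_instructions * (1 + operands_per_instruction)


def cycles_split_buses(num_instructions, operands_per_instruction=1):
    """Separate instruction and data paths: the fetch overlaps with the
    operand accesses, so each instruction costs as many cycles as the
    busier of the two paths needs."""
    return num_instructions * max(1, operands_per_instruction)


n = 1000
print(cycles_shared_bus(n))   # 2000
print(cycles_split_buses(n))  # 1000
```

With one operand per instruction, the split design needs half the cycles in this model. Real processors will not match these numbers, since cache behavior and pipelining dominate in practice.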

Flowchart: the memory read cycle in a von Neumann architecture, showing how the bus is used sequentially.

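Modern high-performance computing systems usually adopt a modified Harvard architecture. At the L1 cache level they behave like a Harvard machine, with separate instruction and data caches, to maximize speed. At the main memory level they keep the von Neumann model, which preserves flexibility for programs.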
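The split between the two levels can be illustrated with a small model. This is a minimal sketch, not a model of any specific processor: the class `ModifiedHarvardCore` and its methods are hypothetical, the caches are unbounded dictionaries with no eviction, and write handling and coherence between the two caches are left out.

```python
class ModifiedHarvardCore:
    """Toy core with split L1 caches in front of one unified memory."""

    def __init__(self, memory):
        self.memory = memory   # unified main memory holding code and data
        self.icache = {}       # L1 instruction cache (Harvard-style)
        self.dcache = {}       # L1 data cache (Harvard-style)
        self.memory_reads = 0  # number of accesses that reached main memory

    def _read(self, cache, addr):
        # On a miss, fill the chosen cache from the shared main memory.
        if addr not in cache:
            cache[addr] = self.memory[addr]
            self.memory_reads += 1
        return cache[addr]

    def fetch(self, addr):
        """Instruction fetch: goes through the instruction cache only."""
        return self._read(self.icache, addr)

    def load(self, addr):
        """Data load: goes through the data cache only."""
        return self._read(self.dcache, addr)


# Code and data live in the same address space.
core = ModifiedHarvardCore(["ADD", "SUB", 7, 9])
core.fetch(0)             # miss, filled from main memory
core.load(2)              # miss, filled from main memory
core.fetch(0)             # hit in the instruction cache
print(core.memory_reads)  # 2
```

The two caches are independent structures, so a fetch and a load never compete for the same L1 storage, while both ultimately read from the same `memory` list.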

QUESTION 1

What is the defining characteristic of the von Neumann Bottleneck?

The CPU speed is slower than the bus speed.

A single bus must alternate between fetching code and accessing data.

The memory capacity is too small for modern code.

The L1 cache and L2 cache use different voltages.

QUESTION 2

Which architecture is typically used for L1 cache implementations in ARMv8-A?

Pure von Neumann

Harvard Architecture

Stack-based Architecture

Single-Bus CISC

QUESTION 3

In a Modified Harvard Architecture, where does the 'von Neumann' aspect usually reside?

At the L1 Cache level

At the Main RAM/Global Memory level

Inside the Arithmetic Logic Unit

In the register file

QUESTION 4

What advantage does a von Neumann architecture provide to Just-In-Time (JIT) compilers?

It prevents memory fragmentation.

It treats written instructions exactly like data variables.

It allows for higher clock frequencies.

It automatically encrypts memory.

QUESTION 5

How many clock cycles are minimally required to fetch one instruction and one data operand in a pure Harvard architecture?

One cycle (Simultaneous fetch)

Two cycles (Sequential fetch)

Four cycles (Multiplexed fetch)

Zero cycles (Pre-cached)